Abstract



In this paper, we consider a class of sensor networks 
where the data is not required in real-time by an ob- 
server; for example, a sensor network monitoring a 
scientific phenomenon for later play back and anal- 
ysis. In such networks, the data must be stored in 
the network. Thus, in addition to battery power, stor- 
age is a primary resource: the useful lifetime of the 
network is constrained by its ability to store the
generated data samples. We explore the use of a
collaborative storage technique to efficiently manage
data in storage-constrained sensor networks. The
proposed collaborative storage technique takes
advantage of spatial correlation among the data
collected by nearby sensors to significantly reduce the
size of the data near the data sources. We show that the
proposed approach provides significant savings in the
size of the stored data vs. local buffering, allowing 
the network to run for a longer time without running 
out of storage space and reducing the amount of data 
that will eventually be relayed to the observer. In ad- 
dition, collaborative storage performs load balancing 
of the available storage space if data generation rates 
are not uniform across sensors (as would be the case 
in an event driven sensor network), or if the available 
storage varies across the network. 



1 Introduction 

Wireless Sensor Networks (WSNs) hold the promise 
of revolutionizing sensing across a range of civil, sci- 
entific, military and industrial applications. How- 
ever, many battery-operated sensors have constraints 
such as limited energy, computational ability, and 
storage capacity, and thus protocols must be de- 
signed to deal efficiently with these limited re- 
sources. 

In this paper, we consider a class of sensor net- 
works where the information collected by the sensors 
is not collected in real-time. In such applications, the 
data must be stored, at least temporarily, within the 
network until it is later collected by an observer (or 
until it ceases to be useful). Such applications in- 
clude scientific monitoring: the sensors are deployed 
to collect detailed information about a phenomenon 
for later playback and analysis. In addition, some ap- 
plications have sensors which collect data that may 
be needed by users of the networks that generate 
queries dynamically. In such applications, the data 
must be stored in the network; storage becomes a 
primary resource which, in addition to energy, de- 
termines the useful lifetime of the network. This pa- 
per considers the problem of storage management in 
such networks: how to use limited persistent storage 
of a sensor to store sampled data effectively. In addi- 
tion to the applications above, storage can be used to 
tolerate temporary network partitioning, where the 
observer is not reachable from the partitioned
sensors, without losing potentially valuable data.

One basic storage management approach is to 
buffer the data locally at the sensors that collect 
them. However, such an approach does not capitalize 
on the spatial correlation of data among neighbor- 
ing sensors to reduce the overall size of the stored 
data (the property that makes data aggregation possi- 
ble Q). Collaborative storage management can pro- 
vide the following advantages over a simple buffer- 
ing technique: (1) More efficient storage allows the 
network to continue storing data for a longer time 
without exhausting storage space; (2) Load balanc- 
ing is possible: if the rate of data generation is not 
uniform at the sensors (e.g., in the case where a lo- 
calized event causes neighboring sensors to collect 
data more aggressively), some sensors may run out 
of storage space while space remains available at oth- 
ers. In such a case, it is important for the sensors to 
collaborate to achieve load balancing for storage to 
avoid or delay data loss due to insufficient local stor- 
age; and (3) Dynamic, localized reconfiguration of 
the network (such as adjusting sampling frequencies 
of sensors based on estimated data redundancy and 
current resources). 

We describe a cluster-based collaborative storage 
approach and compare it through simulations to a
local buffering technique. Our experiments show 
that collaborative storage makes more efficient use 
of sensor storage and provides load balancing, espe- 
cially if a high level of spatial correlation among the 
data of neighboring sensors is present. The trade-off 
is that using collaborative storage, data need to be 
communicated among neighboring nodes, and thus 
collaborative storage expends more energy than lo- 
cal buffering. However, since data is aggregated us- 
ing collaborative storage, a smaller amount of data 
is stored and a smaller amount of data is eventually 
relayed to the observer, thereby reducing energy dis- 
sipation in this phase of operation. We then explore 
the use of coordination for redundancy control. More 
specifically, the cluster head can evaluate the amount 
of redundancy present among neighboring sensors, 
and feed this information back to the sensors so that
they can adjust their sampling rates. We exploit
coordination in conjunction with local storage as well
as collaborative storage and show that it provides
desirable properties in both cases.

The remainder of this paper is organized as follows.
Section 2 overviews the partitioned sensor network
problem and motivates collaborative storage in more
detail in the context of this problem. Section 3
provides an overview of related work in this area.
Section 4 presents the proposed storage management
protocols and discusses the important design tradeoffs.
In Section 5 we evaluate the storage alternatives under
different scenarios. Finally, Section 6 presents
conclusions and our future research.

2 Motivation 

In this section, we describe two different ap- 
plications which require in-network storage. Ze- 
braNet 0, is a sensor network application for wild- 
life tracking whose goal is to provide more insight 
into complex issues such as migration patterns, so- 
cial structures and mobility models of various ani- 
mal species. In this application, sensors are attached 
to animals. Scientists (aka observers) collect the data 
by driving around the monitored habitat and receiving
information from the zebras as they come into range.
Data collection is not preplanned: it might
be unpredictable and infrequent. The sensors do not 
have an estimate regarding the observer's schedule. 
The observer would like the network to maintain all 
the new data samples available since the last time the 
data was collected. Further, we would like the col- 
lection time to be small since the observer may not 
be in range with the zebra for a long time. 

The second example application is a Remote Eco- 
logical Micro-Sensor Network |[T2l aimed at remote 
visual surveillance of federally listed rare and endan- 
gered plants. This project aims to provide near-real 
time monitoring of important events such as visita- 
tion by pollinators and consumption by herbivores 
along with monitoring a number of weather condi- 
tions and events. Sensors are placed in different habi- 
tats, ranging from scattered low shrubs to dense trop- 
ical forests. Environmental conditions can be severe; 
e.g., some locations frequently freeze. In this
application, network partitioning (relay nodes becoming
unavailable) may occur due to the extreme physi- 
cal conditions (e.g., deep freeze). Important events 
that occur during disconnection periods should be 
recorded and reported once the connection is reestab- 
lished. Effective storage management is needed to 
maximize the partitioning time that can be tolerated. 

3 Related Work 

Because of the wireless nature of sensors, the pri- 
mary resource constraint is the limited battery en- 
ergy available. Energy-awareness permeates all as- 
pects of sensor design and operation, from the phys- 
ical design of the sensor Q |3l to the design of its 
operating system |6||, communication protocols and 
applications |[T6l . 

Ratnasamy et al. propose using Data Centric Stor- 
age (DCS) to store data by name within a sen- 
sor network such that all related data is stored at 
the same (or nearby) sensor nodes using geographic 
hashing LI IJ . Thus, queries for data of a certain 
type are likely to be satisfied by a small number of 
nodes, significantly improving the performance of 
queries. However, this enhanced query performance 
requires moving related data from its point of gener- 
ation to its appropriate keeper as determined by geo- 
graphic hashing. We view this work as a higher level 
management of data focusing on optimizing queries 
rather than storage: our approach could complement
DCS by providing more effective storage of the data 
as it is collected. 

Concurrently with us f]A\, Ganesan et al have ex- 
plored protocols for storage constrained sensor net- 
works H. The work by Ganesan et al considers 
the same problem and explores some of the solu- 
tion space we are considering. Our work differs in 
the following ways: (1) We explore additional ap- 
proaches to storage management, including those us- 
ing coordination; (2) we explore issues that arise due 
to uneven data generation (e.g., due to event driven, 
or adaptive sampling applications), and non-uniform 
storage distribution (e.g., due to non-uniform deploy- 
ment of the sensors). In such applications, effective 
load balancing is required; and (3) we study some
additional characteristics of the storage protocols,
including coverage and collection time and energy.

4 Storage Management Protocols 

A primary objective of storage management pro- 
tocols is to efficiently utilize the available storage 
space to continue collecting data for the longest pos- 
sible time without losing samples in an energy effi- 
cient way. Storage management approaches can be 
classified as: 

1. Local storage: This is the simplest solution 
where every sensor stores its data locally. This 
protocol is energy efficient during the storage 
phase since it requires no data communication. 
Even though the storage energy is high (due to 
all the data being stored), the current state of 
technology is such that storage costs less than 
communication. However, this protocol is stor- 
age inefficient since the data is not aggregated 
and redundant data is stored among neighboring 
nodes. Local storage is unable to load balance 
if data generation or the available storage varies 
across sensors. 

2. Collaborative storage: Collaborative storage 
refers to any approach where nodes collaborate. 
Collaboration leads to two benefits: (1) Less 
data is stored: measurements obtained from 
nearby sensors are typically correlated. This allows
data samples from neighboring sensors to
be aggregated; and (2) Load balancing: collab- 
oration among sensors allows them to load bal- 
ance storage. 

It is important to consider the energy implications of 
collaborative storage relative to local storage. Col- 
laborative storage requires sensors to exchange data, 
causing them to expend energy during the storage 
phase. However, because they are able to aggregate 
data, the energy expended in storing this data to a 
storage device is reduced. In addition, once connec- 
tivity with the observer is established, less energy is 
needed during the collection stage to relay the stored 
data to the observer. We note that this holds true 
even if in-network aggregation is carried out for
locally buffered data during the reach-back stage due
to the following two reasons: (1) Initial communica- 
tion (first hop) of the locally buffered data will not 
be aggregated; and (2) Less efficient aggregation: a 
smaller amount of time and resources is available 
when near real-time data aggregation is applied dur- 
ing reach-back as compared to aggregation during 
the storage phase. Aggregating data during reach- 
back is limited because all the data collected during 
the storage phase is compressed in a short time. 

4.1 Collaborative Storage Protocols 

Within the space of collaborative storage, a num- 
ber of protocols are possible. The primary proto- 
col we study is Cluster Based Collaborative Storage 
(CBCS). CBCS uses collaboration among nearby 
sensors only: these have the highest likelihood of 
correlated data and require the least amount of en- 
ergy for collaboration. We did not consider wider 
collaboration because the collaboration cost may be- 
come prohibitive; the cost of communication is sig- 
nificantly higher than the cost of storage under cur- 
rent technologies. The remainder of this section de- 
scribes CBCS operation. 

In CBCS, clusters are formed in a distributed 
connectivity-based or geographically-based fashion 
- almost any one-hop clustering algorithm would 
suffice. Each sensor sends its observations to the 
elected Cluster Head (CH) periodically. The CH 
then aggregates the observations and stores the ag- 
gregated data. Only the CH needs to store aggregated 
data, thereby resulting in low storage. The clusters 
are rotated periodically to balance the storage load 
and energy usage. Note that only the CH needs to 
keep its radio on during its tenure, while a cluster 
member can turn off its radio except when it has data 
to send. This results in high energy efficiency: idle 
power consumes significant energy in the long run if 
radios are kept on. The reception of unneeded pack- 
ets while the radio is on also consumes energy. 

Operation during CBCS can be viewed as a contin- 
uous sequence of rounds until an observer/base sta- 
tion is present and the reach-back stage can begin. 
Each round consists of two phases: (1) CH election
phase: In this phase, each sensor advertises its
resources to its one-hop neighbors. Based on this
resource information, a cluster head (CH) is selected.
The remaining nodes then attach themselves to that
CH for the data exchange phase; and (2) Data
exchange phase: If a node is connected to a CH, it sends
its observations to the CH; otherwise, it stores its
observations locally.

The CH election approach used in CBCS is based on 
the characteristics of the sensor nodes such as avail- 
able storage, available energy or proximity to the 
"expected" observer location. The criteria for CH 
selection can be arbitrarily complex; in our
experiments we used available storage as the criterion.
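
As a rough illustration of a CBCS round with storage-based election, the following minimal Python sketch may help. It assumes a single one-hop cluster; the names `Sensor`, `elect_cluster_head` and `run_round`, the tie-breaking rule (lowest node id), and the absence of any overflow handling are all assumptions made for illustration, not details taken from the protocol description.

```python
from dataclasses import dataclass


@dataclass
class Sensor:
    node_id: int
    free_storage: int  # advertised resource: bytes of storage still available


def elect_cluster_head(sensors):
    """Pick the sensor advertising the most free storage.

    Ties are broken in favor of the lowest node id (an assumption).
    """
    return max(sensors, key=lambda s: (s.free_storage, -s.node_id))


def run_round(sensors, samples, aggregation_ratio):
    """Run one round: elect a CH, then store the aggregated data at the CH.

    samples maps node_id -> bytes generated in this round. The stored size is
    modeled as aggregation_ratio * total raw bytes (hypothetical model).
    """
    ch = elect_cluster_head(sensors)
    raw_bytes = sum(samples[s.node_id] for s in sensors)
    stored = int(raw_bytes * aggregation_ratio)
    ch.free_storage -= stored  # only the CH consumes storage in this round
    return ch.node_id, stored
```

Because the CH's free storage shrinks after each round, repeating the election naturally moves the CH role to a different node, which is the rotation effect described above.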

There has been considerable research in cluster 
formation algorithms for MANETs that considered 
both static and dynamic cluster head election. Our 
requirements for the clustering algorithm are that it 
be light-weight and localized - only one-hop clus- 
ters. Moreover, we require cluster head rotation for 
load balancing of energy and storage. This is an idea 
borrowed from the LEACH protocol [5]. The
approach we use is a representative one and there is
room for future improvements in this aspect of the 
protocol. 

CH rotation is done by repeating the cluster elec- 
tion phases with every round. The frequency of clus- 
ter rotation influences the performance of the pro- 
tocol. Depending on the cluster formation criteria, 
there is an overhead for cluster formation due to the 
exchange of messages. 

The cluster election approach above may result in 
a situation where a node A selects a neighbor B to
be its CH when B itself selects C (which is out of 
range with A) to be its own CH. This may result in 
chains of cluster heads leading to ineffective/multi- 
hop clustering. To eliminate the above problem and 
restrict clusters to one hop, geographical zoning is 
used, an idea that is similar to the approach of
constructing virtual grids [13]. More specifically, the
sensor field is divided into zones such that all nodes
within a zone are in range of each other. Cluster
selection is then localized to a zone such that a node
only considers cluster advertisements occurring in its
zone. Only one CH is selected per zone, eliminating
CH chaining as discussed above. We note that
this approach requires either pre-configuration of the
sensors or the presence of a location discovery
mechanism (GPS cards or a distributed localization
algorithm [12]). In sensor networks, localization is of
fundamental importance as the physical context of the
reporting sensors must be known in order to interpret 
the data. We therefore argue that our assumption that 
sensors know their physical co-ordinates is realistic. 
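
The zoning idea can be sketched as follows, assuming square zones aligned to the origin of the field and nodes that know their (x, y) coordinates. The function names, the tuple layout of `nodes`, and the tie-breaking rule are hypothetical; only the constraint that all nodes in a zone be within radio range of each other comes from the text.

```python
import math


def max_zone_side(radio_range):
    """Largest square side whose diagonal still fits within the radio range."""
    return radio_range / math.sqrt(2)


def zone_of(x, y, zone_side):
    """Return the (column, row) index of the square zone containing (x, y)."""
    return (int(x // zone_side), int(y // zone_side))


def elect_per_zone(nodes, zone_side):
    """Elect one CH per zone based on free storage.

    nodes is an iterable of (node_id, x, y, free_storage) tuples. A node only
    competes with nodes in its own zone, so CH chains cannot form.
    Ties go to the lowest node id (an assumption).
    """
    best = {}
    for node_id, x, y, free in nodes:
        zone = zone_of(x, y, zone_side)
        if zone not in best or (free, -node_id) > (best[zone][1], -best[zone][0]):
            best[zone] = (node_id, free)
    return {zone: ch_id for zone, (ch_id, _) in best.items()}
```

For a 100 meter radio range, `max_zone_side(100)` is about 70.7 meters, so a 70 meter zone side satisfies the one-hop requirement.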

4.2 The Role of Coordination 

One idea we explore is coordination among the sen- 
sors. Specifically, each sensor has a local view of the 
phenomenon, but cannot assess the importance of its 
information given that other sensors may report cor- 
related information. For example, in an application 
where 3 sensors are sufficient to triangulate a phe- 
nomenon, 10 sensors may be in a position to do so 
and be storing this information locally or sending it 
to the cluster head for collaborative storage. Through 
coordination, the cluster head can inform the nodes 
of the degree of redundancy, allowing the sensors
to alternate triangulating the phenomenon.
Coordination can be carried out periodically at low
frequency, with a small overhead (e.g., with CH
election). Similar to CH election, the nodes exchange
meta data describing their reporting behavior and we 
assume that some application specific estimate of re- 
dundancy is performed to adjust the sampling rate. 

As a result of coordination, it is possible that a sig- 
nificant reduction in the data samples produced by 
each sensor is achieved. We note that this reduction 
represents a portion of the reduction that is achieved 
from aggregation. For example, in a localization ap- 
plication, with 10 nodes in position to detect an in- 
truder, only 3 nodes are needed. Coordination allows 
the nodes to realize this and adjust their reporting 
so that only 3 sensors produce data in every period. 
However, the three samples can still be aggregated 
into the estimated location of the intruder once the 
values are combined at the cluster head. 
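
One simple way to realize the alternation in this example is a round-robin reporting schedule. The sketch below is only an illustration under that assumption; the source does not specify how sensors take turns, and `reporting_schedule` is a hypothetical name.

```python
def reporting_schedule(sensor_ids, needed, periods):
    """Rotate which `needed` sensors report in each period (round-robin).

    Returns a list with one entry per period; each entry lists the sensors
    that sample and report during that period.
    """
    n = len(sensor_ids)
    schedule = []
    for period in range(periods):
        start = (period * needed) % n
        active = [sensor_ids[(start + i) % n] for i in range(needed)]
        schedule.append(active)
    return schedule
```

With 10 sensors and 3 needed per period, each period produces 3 samples instead of 10, and over 10 periods every sensor reports the same number of times, so the sampling load is spread evenly.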

Coordination can be used in conjunction with lo- 
cal storage or collaborative storage. In Coordinated 
Local Storage (CLS), the sensors coordinate
periodically and adjust their sampling schedules to reduce
the overall redundancy, thus reducing the amount of
data that will be stored. Note that the sensors
continue to store their readings locally. Relative to Local
Storage (LS), CLS results in a smaller overall storage
requirement and savings in the energy spent storing the
data. This also results in a smaller and more energy
efficient data collection phase. Similarly,
Coordinated Collaborative Storage (CCS) uses coordination
to adjust the sampling rate locally. Similar to CBCS,
the data is still sent to the cluster head where
aggregation is applied. However, as a result of
coordination, a sensor can adapt its sampling frequency
or data resolution to match the application requirements.
In this case, the energy spent sending the data to the
cluster head is reduced because of the smaller size of the
generated data, but the overall size of the stored data
is not reduced relative to CBCS. We evaluate CLS and CCS
compared to their non-coordinated counterparts, LS and CBCS.

5 Experimental Evaluation 

We simulated the proposed storage management 
protocols using the NS-2 simulator [10]. We use a
CSMA-based MAC layer protocol. A sensor field
of 350 x 350 meters is used, with each sensor having
a transmission range of 100 meters. We considered
three levels of sensor density: 50, 100 and 150
sensors deployed randomly. We divide the field into
25 zones (each zone is 70 x 70 meters, which ensures
that any sensor in a zone is in range of any other
sensor in the same zone). The simulation time for
each scenario was set to 500 seconds and each point 
represents an average over five different topologies. 
Cluster rotation and coordination are performed ev- 
ery 100 seconds in the appropriate protocols. 

We assume sensors have a constant sampling rate 
(set to one sample per second). Unless otherwise in- 
dicated, we set the aggregation ratio to a constant 
value of 0.5. For the coordination protocols, we used 
a scenario where the available redundancy was on 
average 30% of the data size - this is the percentage 
of the data that can be eliminated using coordination. 
We note that this reduction in the data size represents 
a portion of the reduction possible using aggregation.
With aggregation, the full data is available at
the cluster head and can be compressed at a higher
efficiency. Several sensor nodes that are appearing
on the market, including Berkeley MICA nodes [9],
have Flash memories. Flash memories have excellent
power dissipation properties and a small form factor.
As a representative device, we consider a SimpleTech
flash memory USB card [15], whose transfer energy
is 0.055 J per Mbyte. In current wireless communication
technologies (Radio Frequency based), the cost
of communication is high compared to the cost of 
storage. For example, representative radios follow- 
ing the Zigbee IEEE 802.15.4 standard consume en- 
ergy at roughly 40 times the cost of the SimpleTech 
USB card above per unit data. Our energy models in 
the simulation are based on these two devices. Fur- 
ther, we adjust the radio properties to match those of 
a Zigbee device. 
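
To make these figures concrete, the following back-of-the-envelope sketch compares the storage-phase energy of local and collaborative storage for one sensor's data. It assumes the 40x factor applies to moving one Mbyte over one hop, and it ignores reception cost, cluster election overhead, idle listening and packet loss; the function names are hypothetical.

```python
import math

STORAGE_J_PER_MB = 0.055        # flash transfer energy quoted in the text
RADIO_TO_STORAGE_RATIO = 40     # "roughly 40 times" (approximate)
RADIO_J_PER_MB = STORAGE_J_PER_MB * RADIO_TO_STORAGE_RATIO  # about 2.2 J/MB


def local_storage_energy(data_mb):
    """Storage-phase energy when every sample is written locally."""
    return data_mb * STORAGE_J_PER_MB


def collaborative_storage_energy(data_mb, aggregation_ratio):
    """Storage-phase energy when data is sent one hop to the CH, which then
    stores only the aggregated fraction (simplified model, see assumptions)."""
    send = data_mb * RADIO_J_PER_MB
    store = data_mb * aggregation_ratio * STORAGE_J_PER_MB
    return send + store
```

Under these assumptions, storing 1 Mbyte locally costs 0.055 J, while sending it to the CH and storing half of it costs about 2.23 J, which is why the storage phase is more expensive for collaborative storage.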

Note that the possible data aggrega- 
tion/compression as well as the reduction due 
to coordination are application as well as topol- 
ogy dependent. Consider a temperature sensing 
application. For this application a given sensor 
can collect data from all its neighbors and then 
simply take the average and store a single value (or 
maybe minimum, mean and maximum values) as 
representative. However, if the sensors are sending 
video data, then such high spatial compression 
might not be possible. In this paper, instead of 
considering a specific application, we assume a data 
aggregation model where the cluster head is able 
to compress the size of the data by an aggregation 
ratio α. By controlling α we can consider different
applications with different levels of available spatial
correlation. In this model, the size of the aggregated
data grows linearly with the number of available 
sensors. We consider the implications of this model 
on collaborative storage and explore other possible 
models later in this section. We would like to
emphasize that we have selected these numbers
only as representative values to illustrate the various
tradeoffs. Due to space constraints we cannot
include the results comparing all these protocols
with various values of the aggregation ratio.
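
The linear aggregation model can be written down in a few lines. This is a sketch under the assumption that the aggregation ratio is the stored size divided by the raw size and that every sensor contributes the same number of bytes; `aggregated_bytes` and `mean_storage_per_sensor` are illustrative names.

```python
def aggregated_bytes(num_sensors, bytes_per_sensor, alpha):
    """Size stored at the CH under the linear model: alpha * total raw bytes."""
    return alpha * num_sensors * bytes_per_sensor


def mean_storage_per_sensor(num_sensors, bytes_per_sensor, alpha):
    """Average storage load per cluster member, assuming the CH role rotates
    evenly so that the aggregated data is spread over all members."""
    return aggregated_bytes(num_sensors, bytes_per_sensor, alpha) / num_sensors
```

Because the aggregated size grows linearly with the number of sensors, the mean per-sensor load in this model depends only on the aggregation ratio and not on the cluster size.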

Figure 1: Storage space vs. Network Density



5.1 Storage and Energy Tradeoffs 

Figure 1 shows the average storage used per sensor as
a function of the number of sensors (50, 100 and 150
sensors) for the four storage management techniques:
(1) Local Storage (LS); (2) Cluster-Based Collaborative
Storage (CBCS); (3) Coordinated Local Storage
(CLS); and (4) Coordinated Collaborative Storage
(CCS). For CBCS, the aggregation ratio was set
to 0.5. The storage space consumption of LS is
independent of the density and is greater than that of
CBCS and CCS (roughly in proportion to the
aggregation ratio). The CLS storage requirement lies
between the two because it is able to reduce the
storage requirement using coordination (we assumed
that coordination yields an improvement uniformly
distributed between 20% and 40%). Note that after data
exchange, the storage requirements for CBCS and CCS
are roughly the same since aggregation at the cluster
head can reduce the data to a minimum size, regardless
of whether coordination took place or not.

Surprisingly, in the case of collaborative storage,
the storage space consumption decreases slightly as
the density increases. While this is counter-intuitive,
it is due to higher packet loss observed during the
exchange phase as the density increases; as density
increases, the probability of collisions increases. These
losses are due to the use of a contention-based,
unreliable MAC layer protocol when nodes transmit
their data to the CH. The negligible difference
in the storage space consumption between CBCS and
CCS is also an artifact of the slight difference in the
number of collisions observed in the two protocols. A
reliable protocol such as that in IEEE 802.11,
or a reservation-based protocol such as the TDMA-based
protocol employed by LEACH [5], can be used
to reduce or eliminate losses due to collisions (at an
increased communication cost). We leave the
exploration of these tradeoffs to future work. Packet
loss ranged from around 1% for the 50 sensor case to
around 10% for the 150 sensor scenarios. Regardless
of the effect of collisions, one can clearly see that
collaborative storage achieves significant savings in
storage space compared to local storage protocols (in
proportion to the aggregation ratio).

Figure 2 shows the consumed energy for the
protocols in Joules as a function of network density.
The X-axis represents protocols for different
network densities: L and C stand for local buffering
and CBCS respectively. L-1, L-2, and L-3 represent
the results with the local buffering technique for network
sizes of 50, 100 and 150 sensors, respectively. The energy bars
are broken into two parts: pre-energy, which is the 
energy consumed during the storage phase, and post- 
energy, which is the energy consumed during data 
collection (the relaying of the data to the observer). 
The energy consumed during storage phase is higher 
for collaborative storage because of the data com- 
munication among neighboring nodes (not present 
in local storage) and due to the overhead for clus- 
ter rotation. CCS spends less energy than CBCS due 
to reduction in data size that results from coordina- 
tion. However, CLS has higher expenditure than LS 
since it requires costly communication for coordina- 
tion. This cost grows with the density of the network 
because our coordination implementation has each 
node broadcasting its update and receiving updates 
from all other nodes. 

Figure 2: Energy consumption vs. Density

For the storage and communication technologies
used, the cost of communication dominates that of
storage. As a result, the cost of the additional
communication during collaborative storage might not be
recovered by the reduced energy needed for storage
except at very high compression ratios. This tradeoff
is a function of the ratio of communication cost
to storage cost; if this ratio goes down in the future
(for example, due to the use of infra-red communication
or ultra-low power RF radios), collaborative
storage becomes more energy efficient compared to
local storage. Conversely, if the ratio goes up,
collaborative storage becomes less efficient.
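
The dependence on the communication-to-storage cost ratio can be made explicit with a simplified per-sensor energy balance. The derivation below is a sketch, not the simulation's energy model: it assumes a one-hop collection cost, no aggregation of locally buffered data during collection, and it ignores reception energy, election overhead and packet loss. All symbols are introduced here for illustration.

```latex
% Assumed symbols (not from the paper): D = raw data generated per sensor,
% e_s = storage energy per unit data, e_c = one-hop communication energy per
% unit data, r = e_c / e_s, \alpha = aggregation ratio (stored size / raw size).
\begin{align*}
E_{\mathrm{LS}}   &= \underbrace{D\,e_s}_{\text{store}}
                   + \underbrace{D\,e_c}_{\text{collect}}
                   = D\,e_s\,(1 + r),\\
E_{\mathrm{CBCS}} &= \underbrace{D\,e_c}_{\text{send to CH}}
                   + \underbrace{\alpha D\,e_s}_{\text{store}}
                   + \underbrace{\alpha D\,e_c}_{\text{collect}}
                   = D\,e_s\,\bigl(r + \alpha(1 + r)\bigr),\\
E_{\mathrm{CBCS}} \le E_{\mathrm{LS}}
                  &\iff \alpha\,(1 + r) \le 1
                   \iff \alpha \le \frac{1}{1 + r}.
\end{align*}
```

Under these assumptions, a ratio of roughly r = 40 would require the aggregated data to be at most about 1/41 (around 2.4%) of the raw data before collaborative storage breaks even on energy, and the threshold relaxes as r decreases.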

The data collection model depends on the application
and network organization; several models are
in use for deployed sensor networks. We use a simple
collection model where we only account for the
cost of transferring the data one hop. This model is
representative of an observer that moves around and
gathers data from the sensors. Also, in cases where
the local buffering approach carries out aggregation
at the first hop towards the observer, the size of the
data becomes similar in the two approaches and the
remainder of the collection cost is the same. However,
this is slightly optimistic in favor of local storage
because near real-time data aggregation will not
in general be able to achieve the same aggregation
level during collection as is achieved during
collaborative storage. This is due to the fact that
collaborative storage can afford to wait for samples and
compress them efficiently. Moreover, in collaborative
storage, the aggregation is done incrementally
over time, requiring fewer resources than aggregation
during collection where large amounts of data
are processed during a short time period. The
collaborative storage approaches outperform the local
storage ones according to this metric due to their smaller
storage size. CLS outperforms LS for the same reason.

Figure 3: Mean collection time vs. Density

Figure 3 shows that with collaborative storage, the
collection time is considerably lower than that of 
local buffering. In addition, CLS outperforms LS. 
Low collection time and energy are important pa- 
rameters from a practical standpoint. After exploring 
the effect of coordination, the remainder of the paper 
presents results only with the two uncoordinated pro- 
tocols (LS and CBCS). 

5.2 Storage Balancing Effect 

In this study, we explore the load-balancing effect of 
collaborative storage. More specifically, the sensors 
are started with a limited storage space and the time 
until this space is exhausted is tracked. We consider 
an application where a subset of the sensors gener- 
ates data at twice the rate of the others, for exam- 
ple, in response to higher observed activity close to 
some of the sensors. To model the data correlation,
we assume that sensors within a zone have correlated
data. Therefore all the sensors within a zone will
report their readings with the same frequency. We
randomly select zones with high activity; sensors within
those zones will report twice as often as those sensors
within low activity zones.

Figure 4: Percentage of storage depleted sensors as a
function of time.

In Figure 4, the X-axis denotes time (in multiples
of 100 seconds), whereas the Y-axis denotes the per- 
centage of sensors that have no storage space left. 
Using LS, in the even data generation case, all sen- 
sors run out of storage space at the same time and all 
data collected after that is lost. In comparison, CBCS 
provides a longer time without running out of storage
because of its more efficient storage. 

The uneven data generation case highlights the 
load-balancing capability of CBCS. Using LS, the 
sensors that generate data at a high rate exhaust their 
storage quickly; we observe two subsets of sensors 
getting their storage exhausted at two different times. 
In comparison, CBCS has a much longer mean sensor
storage depletion time due to its load balancing
properties, with sensors exhausting their resources
gradually, extending the network lifetime well beyond
that of LS.
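
The difference between the two depletion patterns can be reproduced with a toy model of a single cluster. This sketch is an assumption-laden illustration rather than the simulated scenario: rates differ within one cluster, the CH is always the live node with the most free storage, all data is pooled at the CH, and data arriving after the CH fills up is simply dropped. The function names and parameters are hypothetical.

```python
def ls_depletion_times(rates, capacity):
    """Time at which each sensor fills its own storage under local storage."""
    return [capacity / rate for rate in rates]


def cbcs_depletion_times(rates, capacity, alpha, round_len, max_rounds=10_000):
    """Time at which each sensor's storage is exhausted in a single cluster.

    Each round, the live sensor with the most free storage acts as CH and
    stores alpha * (all data generated in the round).
    """
    free = [float(capacity)] * len(rates)
    depleted_at = [None] * len(rates)
    per_round = alpha * sum(rates) * round_len
    for rnd in range(1, max_rounds + 1):
        alive = [i for i, t in enumerate(depleted_at) if t is None]
        if not alive:
            break
        ch = max(alive, key=lambda i: (free[i], -i))  # ties: lowest index
        free[ch] -= min(free[ch], per_round)
        if free[ch] <= 0:
            depleted_at[ch] = rnd * round_len
    return depleted_at
```

With rates of 2, 2, 1 and 1 units per second, a capacity of 600 units, an aggregation ratio of 0.5 and 100 second rounds, local storage depletes in two steps (at 300 and 600 seconds), while the collaborative variant depletes one sensor at a time between 500 and 800 seconds.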



5.3 Coverage Analysis 

Physically co-located sensors have redundant data. 
For simplicity, we assume that all sensors within a 
zone have correlated data. In this work we consider 
two types of coverage, namely, binary coverage and
manifold coverage, defined as follows: (1) Binary
Coverage: A given zone $Z_i$ is said to be covered at
time $t$ if any one of the sensors $S_1, \ldots, S_k$
in $Z_i$ is reporting and storing the reading. Binary
coverage can be visualized as a step function; and
(2) Manifold Coverage: A given zone $Z_i$ is said to
be covered at time $t$ in proportion to the number of
sensors $j$ ($j \le k$), out of its given set of
sensors $S_1, \ldots, S_k$, that are reporting and
storing the reading. This coverage function can be
visualized as a monotonically increasing function
(which might have diminishing returns after some
point). This means that the higher the number of
reporting sensors, the better the coverage.

Figure 5: Binary Coverage (binary coverage versus
time; X-axis: time in multiples of 100 seconds)
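The two coverage definitions can be written down directly. The sketch below is illustrative only: it represents a zone as a list of booleans (True if that sensor is currently reporting and storing), and it assumes a linear manifold coverage function, whereas the text only requires the function to be monotonically increasing. The function names are hypothetical.

```python
def binary_coverage(reporting):
    """Return 1 if at least one sensor in the zone is reporting and storing, else 0."""
    return 1 if any(reporting) else 0


def manifold_coverage(reporting):
    """Return the fraction j/k of the zone's k sensors that are reporting and storing.

    A linear function is an assumption made here for simplicity; any
    monotonically increasing function of j would fit the definition.
    """
    k = len(reporting)
    if k == 0:
        return 0.0  # an empty zone provides no coverage
    j = sum(1 for is_reporting in reporting if is_reporting)
    return j / k
```

A zone with one of four sensors still storing data has binary coverage 1 but manifold coverage 0.25, which is why the manifold measure degrades gradually while the binary measure changes in a single step.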

Figure 5 shows the binary coverage as a function
of time. CBCS has a higher percentage of active
zones than LS for both data generation models.

Similar trends are seen when considering manifold
coverage (Figure 6). Each line represents the
percentage of zones with some specific coverage
level: for example, the line "quarter" represents the
percentage of zones where at least 25% of the
sensors have storage space left. In the case of LS
with even data generation (Figure 6(a)), the
percentage of zones with full coverage is 100% at
300 seconds, whereas with uneven data generation it
falls to less than 50% within 300 seconds. In CBCS,
at the same times, the coverage is around 96% with
the even data generation model and around 77% with
the uneven data model. Note that a CH stores more
data than an individual sensor; therefore, if the
round time is very long, a given CH might run out of
storage sooner than a sensor storing its data locally.
In LS, the percentage of dead zones (zones with all
sensors out of storage space) rises in two waves for
the uneven data model, reaching up to 30% within
300 seconds and 50% in 500 seconds. However, with
CBCS and the uneven data model, the percentage
of dead zones rises slowly and is below 30% even at
the end of the simulation. In general, these figures
show that the manifold coverage changes abruptly
for local buffering. In contrast, collaborative
storage provides smooth degradation of coverage.
Moreover, the average coverage is higher for
collaborative storage owing to data aggregation and
to load balancing, which transfers data from
high-activity zones to low-activity zones.
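The per-level curves and the dead-zone percentage discussed above can be computed from a snapshot of the network as follows. This is a minimal sketch under assumed data structures: each zone is a list of booleans (True if that sensor still has storage space left), and coverage levels are given as fractions (0.25 for the "quarter" line, 1.0 for full coverage). The function names and the example snapshot are hypothetical.

```python
def percent_zones_at_level(zones, level):
    """Percentage of zones in which at least `level` (0..1) of the sensors
    still have storage space left."""
    if not zones:
        return 0.0
    qualifying = sum(
        1 for zone in zones
        if zone and sum(zone) >= level * len(zone)
    )
    return 100.0 * qualifying / len(zones)


def percent_dead_zones(zones):
    """Percentage of zones in which every sensor is out of storage space."""
    if not zones:
        return 0.0
    dead = sum(1 for zone in zones if not any(zone))
    return 100.0 * dead / len(zones)


# Hypothetical snapshot of four zones with four sensors each.
snapshot = [
    [True, True, True, True],      # full coverage
    [True, False, False, False],   # exactly one quarter left
    [False, False, False, False],  # dead zone
    [True, True, False, False],    # half left
]
```

Evaluating these functions at successive time steps yields one curve per coverage level, which is how plots such as the "quarter" line can be read.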

5.4 Effect of the Aggregation Model 

One limitation of the aggregation model we have
used so far is that the required storage size under
collaboration grows in direct proportion to the
number of sensors in the cluster; that is, the storage
consumed in a round is $\alpha N \cdot D$, where
$\alpha$ is the aggregation ratio, $N$ is the number
of sensors and $D$ is the data sample size. Since the
available storage ($N \cdot S$, where $S$ is the
available storage per sensor) is also a function of
the number of sensors, storage is consumed at a rate
($\frac{\alpha D}{S}$ of the total per round) which is
independent of the number of sensors present in the
zone, assuming perfect load balancing. For most
applications, this will not be the case: the
aggregated data necessary to describe the phenomenon
in the zone does not grow strictly proportionately to
the number of sensors, and we expect storage lifetime
to be longer in dense areas than in sparse ones.

To highlight the above effect, we consider the case
of a biased deployment where sensors are deployed
randomly but with non-uniform density. In addition
to the aggregation model considered so far, we
consider a case where the CH, upon receiving packets
from its N members, needs to store only 1 packet;
for example, when the aggregation function stores the
average value of the N samples (e.g., an average
temperature reading). Clearly, in the second case, the
size of the aggregated data is independent of network
density. We now study how applications with these
different aggregation functions perform on top of a
biased deployment. To model biased deployment, we
consider 4 zones with 5, 4, 3, and 2 sensors,
respectively. In these simulations, the round time
was set to 10 seconds (CH selection happens every
10 seconds).
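The difference between the two aggregation models can be estimated in closed form before looking at simulation results. The sketch below assumes perfect load balancing within a zone, one sample of size D per sensor per round, and S units of storage per sensor; the function names and the numeric values in the usage note are hypothetical and are not taken from the simulation setup.

```python
def rounds_proportional(n, storage_per_sensor, sample_size, aggregation_ratio):
    """Rounds until a zone of n sensors fills up when the CH stores
    aggregation_ratio * n * sample_size per round (fixed aggregation ratio)."""
    total_storage = n * storage_per_sensor
    stored_per_round = aggregation_ratio * n * sample_size
    return total_storage / stored_per_round


def rounds_single_packet(n, storage_per_sensor, sample_size):
    """Rounds until a zone of n sensors fills up when the CH stores a single
    packet per round regardless of n (e.g. an average of the n samples)."""
    total_storage = n * storage_per_sensor
    return total_storage / sample_size


# Zone sizes from the biased deployment described above.
ZONE_SIZES = [5, 4, 3, 2]
```

With S = 100, D = 1, and an aggregation ratio of 0.5, the first model gives 200 rounds for every zone, while the single-packet model gives 500, 400, 300, and 200 rounds for the zones with 5, 4, 3, and 2 sensors: lifetime is independent of density in the first case and proportional to it in the second.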

In the figure, the X-axis shows time (in multiples of
10 seconds), whereas the Y-axis shows the percentage
of sensors providing coverage within a given zone. As
described earlier, we considered 4 zones for this
study, and each line in the figure represents a
particular zone. For example, line Z-5 stands for a
zone with 5 sensors in it, Z-2 denotes the zone with
2 sensors in it, and so on. As shown in Figure 7(a),
when the aggregation ratio is a constant (0.5), all
the zones provide coverage for almost the same
duration. However, in the second case, as shown in
Figure 7(b), coverage is directly proportional to the
network density: the higher the density, the longer
the coverage.

The sensor network coverage from a storage
management perspective depends on the event
generation rate, the aggregation properties, and the
available storage. If the aggregated data size is
independent of the number of sensors (or grows slowly
with it), the density of the zone correlates with the
availability of storage resources. Thus, both the
availability of storage resources and their
consumption may vary within a sensor network. This
argues for load balancing across zones to provide
long network lifetime and effective coverage.
This is a topic of future research.

6 Conclusion and Future Work 

In this paper, we considered the problem of storage
management in sensor networks where the data is not
continuously reported in real-time and must
therefore be stored within the network. Collaborative
storage is a promising approach for storage management
because it enables the use of spatial data aggregation
between neighboring sensors to compress the stored
data and optimize the storage use. Collaborative
storage also allows load balancing of the storage
space, which lets the network maximize the time before
data is lost due to insufficient memory. Compared
with a simple local buffering approach, collaborative
storage reduces the time needed to transfer the data
to the observer during the reach-back stage and
provides better binary and manifold coverage.
Finally, we explored the use of coordination to cut
down on redundancy at the source sensors, resulting in
an improved version of both local storage and
collaborative storage.

While collaborative storage reduces the energy re- 
quired for storage, it requires additional communica- 
tion. Using current technologies, collaborative stor- 
age requires more energy than local buffering. Net- 
work effectiveness is bound both by storage avail- 
ability (to allow continued storage of collected data) 
as well as energy. Thus, protocol designers must be 
careful to balance these constraints: if the network is 
energy constrained, but has abundant storage, local 
storage is most efficient from an energy perspective. 
Alternatively, if the network is storage constrained,
collaborative storage is most effective from a
storage perspective. When the network is constrained
by both, a combination of the two approaches would
probably perform best.

As part of our future research, we would like to 
implement these protocols on real sensor hardware 
platforms such as the Berkeley motes. Furthermore, 
in this study we consider all events to be of the 
same importance, and thus they are stored with the 
same compression ratio (resolution). In our future 
research, we will explore the protocol space wherein 
different events are stored with different resolutions 
(important events are stored in detail whereas unim- 
portant events are stored with a coarser granularity). 
Figure 6: Manifold Coverage. Panels: (a) LS Manifold
Coverage (Even Data Generation); (b) CBCS Manifold
Coverage (Even Data Generation); (c) LS Manifold
Coverage (Uneven Data Generation); (d) CBCS Manifold
Coverage (Uneven Data Generation).

Figure 7: Biased Deployment versus coverage study.
Panels: (a) Aggregation ratio = 0.5: Coverage;
(b) Aggregation ratio = 1/n (per the plot title):
Coverage. X-axis: time (in multiples of 10 seconds).